-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring tutorial #440
base: dev
Are you sure you want to change the base?
Refactoring tutorial #440
Conversation
This PR includes documentation updates New pages: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very valuable addition to the docs.
Just a few comments
|
||
The answer for these issues is, therefore, to refactor the property `languages` into a node and connect it to the `Movie` nodes with a new relationship. | ||
|
||
== Eliminating duplicated data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically we're not deleting data in this section
rename to "reshaping the data"?
* *In order to perform the query, all `Movie` nodes must be retrieved* -> As the graph scales, the performance of a similar query can be dimished by the way you modeled your data. | ||
* *The name of the language is duplicated in many `Movie` nodes (in this case, all of them)* -> If many nodes share a same property value, it could be a sign that this property value could instead become a new entity, like a node or a relationship, for example. | ||
|
||
The answer for these issues is, therefore, to refactor the property `languages` into a node and connect it to the `Movie` nodes with a new relationship. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about a visual to represent the data model before/after?
[source,cypher] | ||
-- | ||
MATCH (m:Movie) | ||
MATCH (l:Language) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- this does not work, there is a where clause missing to take only the language nodes matching the movie languages
- this will work only in small graphs, mid-size graphs and above require sub-transactions
- a unique constraint is missing on the language identifier
- In practice we would do all of this in a single query for efficiency
CREATE CONSTRAINT unique_language FOR (n:Language) REQUIRE n.name IS UNIQUE
Note: untested query
:auto <1>
MATCH (m:Movie)
CALL (m) {
UNWIND m.languages AS language
MERGE (l:Language {name:language})
MERGE (m)-[:IN_LANGUAGE]->(l)
SET m.languages = null
} IN TRANSACTIONS OF 10000 rows
<1> required in neo4j browser to run nested transactions
One way to improve your current model is to check for duplicate key values and see if you can turn them into another entity, like a node or a relationship. | ||
In this case, both production companies are based in California, so the state could be turned into a node for `State` and be connected to the producer companies via a new relationship `LOCATED_AT`: | ||
|
||
image::california.svg[The producer company nodes now have one less property for state and connect to a state node for California, role=popup] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For data consistency, the country
property should also move to the State
nodes
RETURN m | ||
-- | ||
|
||
| How many users rated a movie? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more 'graphy' alternative could be
MATCH (m:Movie)
WHERE m.title = 'Apollo 13'
RETURN COUNT {(:User)-[:RATED]->(m)} AS `Number of reviewers`
|
||
This should be the result: | ||
|
||
image::query-plan.png[Screenshot of Browser featuring a query plan that shows the number of database hits when you retrieve all person nodes,400,400,role=popup] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the data on the screenshot looks suspicious
I would not expect to have 0 rows
in the execution pipeline, unless the DB is empty
No description provided.